Overview
Dataset statistics
| Number of variables | 5 |
|---|---|
| Number of observations | 6040 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 236.1 KiB |
| Average record size in memory | 40.0 B |
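The two memory figures above are related by a simple division. As a quick consistency check, the sketch below recomputes the average record size from the total. It assumes that KiB means 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" table.
total_kib = 236.1        # Total size in memory, in KiB
n_observations = 6040    # Number of observations

# Assumption: 1 KiB = 1024 bytes.
total_bytes = total_kib * 1024
avg_record_bytes = total_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")  # prints "40.0 B per record"
```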
Variable types
| Numeric | 4 |
|---|---|
| Categorical | 1 |
Alerts
| Zip-code is highly skewed (γ₁ = 77.57032144) | Skewed |
|---|---|
| UserID is uniformly distributed | Uniform |
| UserID has unique values | Unique |
| Occupation has 711 (11.8%) zeros | Zeros |
Reproduction
| Analysis started | 2025-07-25 17:17:20.377665 |
|---|---|
| Analysis finished | 2025-07-25 17:18:53.958350 |
| Duration | 1 minute and 33.58 seconds |
| Software version | ydata-profiling v4.16.1 |
| Download configuration | config.json |
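The reported duration follows directly from the two timestamps. A minimal sketch using only the Python standard library, with the timestamps copied from the table above:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2025-07-25 17:17:20.377665", FMT)
finished = datetime.strptime("2025-07-25 17:18:53.958350", FMT)

# Elapsed wall-clock time in seconds, then split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minute and {seconds:.2f} seconds")
```

This reproduces the 1 minute and 33.58 seconds shown in the Duration row.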
Variables
UserID
Real number (ℝ)
Uniform  Unique 
| Distinct | 6040 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3020.5 |
| Minimum | 1 |
|---|---|
| Maximum | 6040 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 47.3 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 302.95 |
| Q1 | 1510.75 |
| median | 3020.5 |
| Q3 | 4530.25 |
| 95-th percentile | 5738.05 |
| Maximum | 6040 |
| Range | 6039 |
| Interquartile range (IQR) | 3019.5 |
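The fractional quantiles above are what linear interpolation gives for the integers 1 to 6040. The sketch below reproduces them with the standard library. It assumes that UserID is exactly the sequence 1..6040, which is an inference from the distinct count, minimum, and maximum rather than something the report states directly.

```python
import statistics

# Assumption: UserID is exactly the integers 1..6040.
user_ids = range(1, 6041)

# method="inclusive" interpolates linearly between neighbouring values,
# treating the minimum and maximum as the 0th and 100th percentiles.
q1, median, q3 = statistics.quantiles(user_ids, n=4, method="inclusive")
ventiles = statistics.quantiles(user_ids, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]
iqr = q3 - q1

print(p5, q1, median, q3, p95, iqr)
```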
Descriptive statistics
| Standard deviation | 1743.7421 |
|---|---|
| Coefficient of variation (CV) | 0.57730248 |
| Kurtosis | -1.2 |
| Mean | 3020.5 |
| Median Absolute Deviation (MAD) | 1510 |
| Skewness | 0 |
| Sum | 18243820 |
| Variance | 3040636.7 |
| Monotonicity | Strictly increasing |
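Because UserID appears to be a plain running index, most of the descriptive statistics above have closed forms. The sketch below recomputes them, again assuming the values are exactly 1..6040. The match with the reported variance suggests the report uses the sample (n - 1) denominator, but that is an observation from the numbers, not a documented fact.

```python
import math
import statistics

n = 6040
user_ids = range(1, n + 1)   # assumption: UserID is exactly 1..n

total = sum(user_ids)                      # closed form: n(n+1)/2
mean = total / n                           # closed form: (n+1)/2
variance = statistics.variance(user_ids)   # sample variance, equals n(n+1)/12 here
std = math.sqrt(variance)
cv = std / mean                            # coefficient of variation

# Median absolute deviation: median of |x - median(x)|.
median = statistics.median(user_ids)
mad = statistics.median(abs(x - median) for x in user_ids)

print(total, mean, round(variance, 1), round(std, 4), round(cv, 8), mad)
```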
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 6040 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 6024 | 1 | < 0.1% |
| 6023 | 1 | < 0.1% |
| Other values (6030) | 6030 | 99.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 10 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6040 | 1 | < 0.1% |
| 6039 | 1 | < 0.1% |
| 6038 | 1 | < 0.1% |
| 6037 | 1 | < 0.1% |
| 6036 | 1 | < 0.1% |
| 6035 | 1 | < 0.1% |
| 6034 | 1 | < 0.1% |
| 6033 | 1 | < 0.1% |
| 6032 | 1 | < 0.1% |
| 6031 | 1 | < 0.1% |
Gender
Categorical
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | F |
|---|---|
| 2nd row | M |
| 3rd row | M |
| 4th row | M |
| 5th row | M |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| M | 4331 | 71.7% |
| F | 1709 | 28.3% |
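The percentages in this table are each count divided by the number of observations. A short sketch of that arithmetic, with the counts copied from the table:

```python
# Counts from the Common Values table for Gender.
counts = {"M": 4331, "F": 1709}
total = sum(counts.values())

# Share of each category, formatted to one decimal place like the report.
shares = {value: f"{count / total:.1%}" for value, count in counts.items()}

for value, share in shares.items():
    print(f"{value}: {counts[value]} ({share})")
```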
Length
Histogram of lengths of the category
Common Values (Plot)
| Value | Count | Frequency (%) |
|---|---|---|
| M | 4331 | 71.7% |
| F | 1709 | 28.3% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| M | 4331 | 71.7% |
| F | 1709 | 28.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase Letter | 6040 | 100.0% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| M | 4331 | 71.7% |
| F | 1709 | 28.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 6040 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| M | 4331 | 71.7% |
| F | 1709 | 28.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 6040 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| M | 4331 | 71.7% |
| F | 1709 | 28.3% |
Age
Real number (ℝ)
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 30.639238 |
| Minimum | 1 |
|---|---|
| Maximum | 56 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 47.3 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 18 |
| Q1 | 25 |
| median | 25 |
| Q3 | 35 |
| 95-th percentile | 56 |
| Maximum | 56 |
| Range | 55 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 12.895962 |
|---|---|
| Coefficient of variation (CV) | 0.42089694 |
| Kurtosis | -0.29081008 |
| Mean | 30.639238 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 0.24270008 |
| Sum | 185061 |
| Variance | 166.30583 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 25 | 2096 | 34.7% |
| 35 | 1193 | 19.8% |
| 18 | 1103 | 18.3% |
| 45 | 550 | 9.1% |
| 50 | 496 | 8.2% |
| 56 | 380 | 6.3% |
| 1 | 222 | 3.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 222 | 3.7% |
| 18 | 1103 | 18.3% |
| 25 | 2096 | 34.7% |
| 35 | 1193 | 19.8% |
| 45 | 550 | 9.1% |
| 50 | 496 | 8.2% |
| 56 | 380 | 6.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 56 | 380 | 6.3% |
| 50 | 496 | 8.2% |
| 45 | 550 | 9.1% |
| 35 | 1193 | 19.8% |
| 25 | 2096 | 34.7% |
| 18 | 1103 | 18.3% |
| 1 | 222 | 3.7% |
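Since Age takes only seven distinct values, its sum, mean, and variance can be rebuilt from the value counts alone. The sketch below does so with the counts copied from the tables above; the sample (n - 1) denominator is an assumption that happens to match the reported variance.

```python
# Value -> count, copied from the Age frequency tables.
age_counts = {1: 222, 18: 1103, 25: 2096, 35: 1193, 45: 550, 50: 496, 56: 380}

n = sum(age_counts.values())
total = sum(value * count for value, count in age_counts.items())
mean = total / n

# Weighted sum of squared deviations, then the sample variance.
ss = sum(count * (value - mean) ** 2 for value, count in age_counts.items())
variance = ss / (n - 1)   # assumption: n - 1 denominator
std = variance ** 0.5

print(n, total, round(mean, 6), round(variance, 5), round(std, 6))
```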
Occupation
Real number (ℝ)
Zeros 
| Distinct | 21 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.1468543 |
| Minimum | 0 |
|---|---|
| Maximum | 20 |
| Zeros | 711 |
| Zeros (%) | 11.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 47.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 3 |
| median | 7 |
| Q3 | 14 |
| 95-th percentile | 19 |
| Maximum | 20 |
| Range | 20 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 6.3295115 |
|---|---|
| Coefficient of variation (CV) | 0.77692705 |
| Kurtosis | -1.2141444 |
| Mean | 8.1468543 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 0.33829811 |
| Sum | 49207 |
| Variance | 40.062716 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=21)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 759 | 12.6% |
| 0 | 711 | 11.8% |
| 7 | 679 | 11.2% |
| 1 | 528 | 8.7% |
| 17 | 502 | 8.3% |
| 12 | 388 | 6.4% |
| 14 | 302 | 5.0% |
| 20 | 281 | 4.7% |
| 2 | 267 | 4.4% |
| 16 | 241 | 4.0% |
| Other values (11) | 1382 | 22.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 711 | 11.8% |
| 1 | 528 | 8.7% |
| 2 | 267 | 4.4% |
| 3 | 173 | 2.9% |
| 4 | 759 | 12.6% |
| 5 | 112 | 1.9% |
| 6 | 236 | 3.9% |
| 7 | 679 | 11.2% |
| 8 | 17 | 0.3% |
| 9 | 92 | 1.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 20 | 281 | 4.7% |
| 19 | 72 | 1.2% |
| 18 | 70 | 1.2% |
| 17 | 502 | 8.3% |
| 16 | 241 | 4.0% |
| 15 | 144 | 2.4% |
| 14 | 302 | 5.0% |
| 13 | 142 | 2.4% |
| 12 | 388 | 6.4% |
| 11 | 129 | 2.1% |
Zip-code
Real number (ℝ)
Skewed 
| Distinct | 3403 |
|---|---|
| Distinct (%) | 56.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 87590.829 |
| Minimum | 231 |
|---|---|
| Maximum | 1.9312204 × 10⁸ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 47.3 KiB |
Quantile statistics
| Minimum | 231 |
|---|---|
| 5-th percentile | 2714.95 |
| Q1 | 22314 |
| median | 55107 |
| Q3 | 89110 |
| 95-th percentile | 97214 |
| Maximum | 1.9312204 × 10⁸ |
| Range | 1.9312181 × 10⁸ |
| Interquartile range (IQR) | 66796 |
Descriptive statistics
| Standard deviation | 2485802.4 |
|---|---|
| Coefficient of variation (CV) | 28.37971 |
| Kurtosis | 6024.5333 |
| Mean | 87590.829 |
| Median Absolute Deviation (MAD) | 32900 |
| Skewness | 77.570321 |
| Sum | 5.2904861 × 10⁸ |
| Variance | 6.1792134 × 10¹² |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 48104 | 19 | 0.3% |
| 22903 | 18 | 0.3% |
| 94110 | 17 | 0.3% |
| 55104 | 17 | 0.3% |
| 10025 | 16 | 0.3% |
| 55455 | 16 | 0.3% |
| 55105 | 16 | 0.3% |
| 48103 | 15 | 0.2% |
| 94114 | 15 | 0.2% |
| 55408 | 15 | 0.2% |
| Other values (3393) | 5876 | 97.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 231 | 1 | < 0.1% |
| 606 | 1 | < 0.1% |
| 681 | 1 | < 0.1% |
| 693 | 1 | < 0.1% |
| 918 | 1 | < 0.1% |
| 926 | 1 | < 0.1% |
| 961 | 1 | < 0.1% |
| 1002 | 5 | < 0.1% |
| 1003 | 1 | < 0.1% |
| 1020 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 193122042 | 1 | < 0.1% |
| 5849574 | 1 | < 0.1% |
| 2020010 | 1 | < 0.1% |
| 970025 | 1 | < 0.1% |
| 956456 | 1 | < 0.1% |
| 954025 | 1 | < 0.1% |
| 949702 | 1 | < 0.1% |
| 495321 | 2 | < 0.1% |
| 444555 | 1 | < 0.1% |
| 400060 | 1 | < 0.1% |
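The Sample section of this report shows ZIP codes such as 02460 with a leading zero, while the statistics above treat the column as a number (minimum 231, a handful of values with more than five digits driving the skewness). One way to keep the codes intact is to handle them as text. The sketch below is a minimal illustration under the assumption that five digits is the intended minimum width; `as_zip_text` is a hypothetical helper, not part of ydata-profiling.

```python
def as_zip_text(value: int, width: int = 5) -> str:
    """Render a numerically parsed ZIP code as zero-padded text.

    Values that already have `width` or more digits are returned unchanged.
    """
    return str(value).zfill(width)


# Two values that lost a leading zero, one ordinary value, and the maximum.
for parsed in (2460, 6810, 48067, 193122042):
    print(parsed, "->", as_zip_text(parsed))
```

If the data is loaded with pandas, passing a string dtype for this column to `read_csv` would avoid the numeric parse in the first place.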
Interactions
Correlations
| | Age | Gender | Occupation | UserID | Zip-code |
|---|---|---|---|---|---|
| Age | 1.000 | 0.047 | 0.079 | 0.012 | -0.010 |
| Gender | 0.047 | 1.000 | 0.240 | 0.063 | 0.000 |
| Occupation | 0.079 | 0.240 | 1.000 | -0.016 | 0.042 |
| UserID | 0.012 | 0.063 | -0.016 | 1.000 | -0.060 |
| Zip-code | -0.010 | 0.000 | 0.042 | -0.060 | 1.000 |
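A matrix like this is easiest to read by looking for the largest off-diagonal entry. The sketch below copies the values from the table, checks that the matrix is symmetric, and picks out the strongest pair. The report does not state here which coefficient each cell uses, so the numbers are treated purely as given.

```python
labels = ["Age", "Gender", "Occupation", "UserID", "Zip-code"]
matrix = [
    [1.000, 0.047, 0.079, 0.012, -0.010],
    [0.047, 1.000, 0.240, 0.063, 0.000],
    [0.079, 0.240, 1.000, -0.016, 0.042],
    [0.012, 0.063, -0.016, 1.000, -0.060],
    [-0.010, 0.000, 0.042, -0.060, 1.000],
]
size = len(labels)

# A correlation matrix should equal its own transpose.
is_symmetric = all(
    matrix[i][j] == matrix[j][i] for i in range(size) for j in range(size)
)

# Collect each unordered pair once (upper triangle) with its absolute value.
pairs = [
    (abs(matrix[i][j]), labels[i], labels[j])
    for i in range(size)
    for j in range(i + 1, size)
]
strength, first, second = max(pairs)

print(is_symmetric, first, second, strength)
```

On these values the strongest association is between Gender and Occupation (0.240); every other pair is below 0.1 in absolute value.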
Missing values
A simple visualization of nullity (missingness) by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
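Both displays boil down to a per-cell flag saying whether a value is missing. A minimal sketch of that idea using three rows from the Sample section, with `None` standing in for a missing cell (an assumption about representation, chosen for illustration):

```python
# Three rows copied from the Sample section; None would mark a missing cell.
rows = [
    {"UserID": 1, "Gender": "F", "Age": 1, "Occupation": 10, "Zip-code": "48067"},
    {"UserID": 2, "Gender": "M", "Age": 56, "Occupation": 16, "Zip-code": "70072"},
    {"UserID": 3, "Gender": "M", "Age": 25, "Occupation": 15, "Zip-code": "55117"},
]
columns = list(rows[0])

# Nullity matrix: one row of booleans per record, True where a value is missing.
nullity = [[row[col] is None for col in columns] for row in rows]

# Nullity by column: number of missing cells in each column.
missing_per_column = {
    col: sum(flags[i] for flags in nullity) for i, col in enumerate(columns)
}
missing_cells = sum(missing_per_column.values())

print(missing_per_column, missing_cells)
```

For these rows every count is zero, in line with the 0 missing cells reported in the dataset statistics.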
Sample
First rows
| | UserID | Gender | Age | Occupation | Zip-code |
|---|---|---|---|---|---|
| 0 | 1 | F | 1 | 10 | 48067 |
| 1 | 2 | M | 56 | 16 | 70072 |
| 2 | 3 | M | 25 | 15 | 55117 |
| 3 | 4 | M | 45 | 7 | 02460 |
| 4 | 5 | M | 25 | 20 | 55455 |
| 5 | 6 | F | 50 | 9 | 55117 |
| 6 | 7 | M | 35 | 1 | 06810 |
| 7 | 8 | M | 25 | 12 | 11413 |
| 8 | 9 | M | 25 | 17 | 61614 |
| 9 | 10 | F | 35 | 1 | 95370 |
Last rows
| | UserID | Gender | Age | Occupation | Zip-code |
|---|---|---|---|---|---|
| 6030 | 6031 | F | 18 | 0 | 45123 |
| 6031 | 6032 | M | 45 | 7 | 55108 |
| 6032 | 6033 | M | 50 | 13 | 78232 |
| 6033 | 6034 | M | 25 | 14 | 94117 |
| 6034 | 6035 | F | 25 | 1 | 78734 |
| 6035 | 6036 | F | 25 | 15 | 32603 |
| 6036 | 6037 | F | 45 | 1 | 76006 |
| 6037 | 6038 | F | 56 | 1 | 14706 |
| 6038 | 6039 | F | 45 | 0 | 01060 |
| 6039 | 6040 | M | 25 | 6 | 11106 |